class: center, middle, inverse, title-slide .title[ # Joint Modeling of Longitudinal Imaging
and Survival Data ] .subtitle[ ## Kai Kang & Xin Yuan Song ] .author[ ### Logan Harris & Alan Arakkal ] .date[ ### 2023-04-26 ] --- class: outline # Outline - Background - Joint modeling - Dimension reduction of MRI images - Estimation - Data analysis - Conclusions --- # Background **Alzheimer’s disease (AD)** - Characterized by cognitive deficits - Gradual changes in cognition occurs with normal aging, so it’s difficult to identify during pre-clinical phase and distinguish from mild cognitive impairment (MCI). - As the disease progresses, the decline in cognition accelerates. - Little is known when these changes in cognition occurs or when they are distinguishable from normal aging. - Detection of pre-clinical dementia is important for treatment and outcomes. --- # Background <img src="data:image/png;base64,#images/fig1.jpg" width="50%" height="50%" style="float:right; padding:10px" /> - There is evidence to indicate that dynamic changes of specific brain regions over time may be likely associated with AD progression. - For example, subjects with MCI to AD conversion appear to be associated with increasingly enlarged ventricles across time, whereas subjects who did not progress to AD remained nearly unchanged. --- # Research Objectives - Investigate the association between the longitudinal feature of MRI images and time to AD. - Develop a prognostic model for clinicians to plan interventions based on the history of patients’ MRI images. --- # Data - Alzheimer’s Disease Neuroimaging Initiative (ADNI) study - Overall objective of detecting AD at the earliest stage and identifying ways to track the disease progression. - Recruited subjects aged 55 to 90 and collected their imaging, genetic, clinical, cognitive, and biochemical characteristics across the long-term study. - Subjects could develop AD during the study period, so time-to-AD information was also collected. --- # MRI Data - The preprocessed magnetic resonance imaging (MRI) data were obtained for each participant at several time points. - The image data for a given individual at a given time point can be stored in a voxel (i.e., 3D array structure) of dimension `\(p= p_1 \times p_2 \times p_3\)` - Voxel dimensions are the same across subjects and visits. - Images are voxel-wise stored, i.e., unfolded into a `\(p\times1\)` vector containing the voxels in a particular order, where the order is preserved across all subjects and visits. --- # Complexities - **Joint modeling** - Need model that incorporates dynamic variations/changing patterns of brain MRI images over time along with time-to-AD event information. - **Ultrahigh-dimensional images** - MRI images are of dimension 133 × 170 × 129, so `\(p\)`=2,916,690 - Current joint modeling approaches have not considered settings when the longitudinal observations are Ultrahigh-dimensional. - **Conducting dynamic prediction** - Existing prediction procedures only accommodate longitudinal low-dimensional observations. --- # Joint Modeling - The general concept is rather simple: 1. We have longitudinal data 2. We have survival data 3. We want to model them in a way that the models borrow information from one another - Specifically, the models depend on a common set of latent random effects - Although, the set of shared features could be expanded - Longitudinal data can inform survival outcome by providing time dependent information - e.g. Trajectory of ventricles size likely provides information about time to AD - Survival data can inform longitudinal outcome by accounting for informative censoring - e.g. Missed visits due to progression to AD likely provides information on potentially unobserved trajectories of changes in the ventricles --- # Survival Data Background **Censoring:** - What makes survival data unique is *Censoring* - *Censoring* occurs when we do not observe a subject until they experience the event of interest - May occur due to study ending, death due to another cause, or unrelated circumstances **Notation:** - Let `\(T_i^*\)` be the actual survival time, which may or may not be observed due to censoring - The reason for `\(^*\)` is to distinguish this time from the measurement times in the longitudinal data - Let `\(C_i\)` be the censoring time - Define `\(R_i\)` = `\(min(T_i^*, C_i)\)`, we call this the *observed time* - Finally, define `\(\Delta_i = I(T_i^* \leq C_i)\)` to be the indicator of if the event was observed or not --- # Modeling Survival Data - As the name might imply, modeling survival data relates the survival distribution (S(t)) - However, usually, we work with the hazard instead of directly with S(t) - The hazard function represents the instantaneous failure rate at time `\(t\)` - More specifically, it is the probability that a subject will fail at time `\(t\)` given that `\(T^* \geq t\)` - `\(h(t) = \lim_{\delta \rightarrow 0} \frac{P(t + \delta > T^* \geq t | T^* \geq t)}{\delta}\)` - `\(h(t)\)` relates to `\(S(t)\)` by `\(h(t) = \frac{f(t)}{S(t)}\)` where `\(f(t) = -\frac{d}{dt}S(t) =\frac{d}{dt}F(t)\)` - The hazard is often modeled using the Cox PH model: `$$h(t|Z_i) = h_0(t)exp \Big[Z_i \gamma \Big]$$` - The above includes only constant covariates - Consider the following extension: $$ h(t|\boldsymbol{X}_i(t), Z_i) = h_0(t)exp \Big[\boldsymbol{X}_i(t)\boldsymbol{\beta} + Z_i \gamma \Big] $$ - `\(\boldsymbol{X}(t)\)` is currently an abuse of notation - Purely used to align with how the notation is presented in the paper --- # Back to Longitudinal - Now that we have an idea of the form of the survival model, how will we model the images? - Model `\(Y_{ij}\)` with a mixed effects model with a random intercept and slope - Measured at times `\(T_{ij}\)`: `$$Y_{ij}=\mu_i+X_{i0}+X_{i1}T_{ij}+W_{ij}$$` - `\(Y_{ij}\)` is the value actually observed at time `\(T_{ij}\)` - `\(\mu_i\)` represents the fixed effects - `\(X_{i0}\)` is a random intercept, and `\(X_{i1}\)` a random slope - `\(W_{ij}\)` is an error term representing subject-visit specific deviation **Note: that the terms in the model above can be generalized** --- # Extension to Imaging Data - First, let `\(\mu_i = \mu\)` `$$Y_{ij} = \mu + X_{i0} + X_{i1}T_{ij} + W_{ij}$$` - Then note that we can model each voxel at a given visit `\(j\)`: `$$Y_{ij}(v) = \mu(v) + X_{i0}(v) + X_{i1}(v)T_{ij} + W_{ij}(v)$$` - This simple change in notation introduces a shift in the modeling framework, as this is now a functional mixed effects model - The details of the functional form will be covered later - `\(X_{i0}(v), X_{i1}(v), W_{ij}(v)\)` are all assumed to have zero mean, be square-integrable, and mutaually uncorrelated on `\(\mathcal{V}\)` --- # Combining Models: The Survival model - Recall the form presented previously: $$ h(t|\boldsymbol{X}_i(t), Z_i) = h_0(t)exp \Big[\boldsymbol{X}_i(t)\boldsymbol{\beta} + Z_i \gamma \Big] $$ - We will leave the fixed covariates as is, however, we will include the **REs** from the longitudinal model above in place of the time varying effects - With that said, hazard is a univariate measure, and we have brain scan data that is of much higher dimension - Integrate over this information `$$h(t|X_{i0}(v), X_{i1}(v), \boldsymbol{Z}_i) = h_0(t) exp \Big[ \int_{V} \lbrace \beta_0(v)X_{i0}(v) + t\beta_1(v)X_{i1}(v)\rbrace dv + \gamma'Z_i \Big]$$` - This form for a joint model is similar to that provided Tsiatis and Davidian (2004) - Henderson, Diggle, and Dobson (2000) provides a more general form --- # The Likelihood - Okay, great, but how do we actually fit this model? - To solve this problem, we need a likelihood - Let `\(\boldsymbol{\theta}\)` be the vector containing all unknown parameters - Then the general form of our likelihood is: `$$p(\boldsymbol{R}, \boldsymbol{\Delta}, \tilde{\boldsymbol{Y}}, \boldsymbol{Z} | \boldsymbol{\theta})= \prod_{i=1}^I \int p(R_i, \Delta_i | \boldsymbol{X}_i(v), \boldsymbol{Z}_i, \boldsymbol{\theta})p(\boldsymbol{Y}_i|\boldsymbol{X}_i(v), \boldsymbol{\theta})p(\boldsymbol{X}_i(v)|\boldsymbol{\theta})d\boldsymbol{X}_i(v)$$` - In general, the arrival at a likelihood is not trivial - Tsiatis and Davidian (2004) covers the formulation of the likelihood which depends on the data itself and the modeling assumptions - Due to the treatment of high dimensional data, the likelihood (to be presented) can be expressed in a form similar to that above, but does not have an explicit form --- # PCA and Image Dimension Reduction : Basic Idea - The main idea is to use PCA to obtain low rank approximations of images. <img src="data:image/png;base64,#images/fig2a.jpg" width="50%" style="float:right; padding:20px" /> - Take for example this image: - This gray scale image is a 648 `\(\times\)` 600 image (i.e., p = 388,800) --- # PCA and Image Dimension Reduction : Basic Idea <img src="data:image/png;base64,#images/fig2b.jpg" width="40%" style="float:right; padding:20px" /> - Implement PCA using singular value decomposition (SVD). - Plot normalized eigenvalue by singular value (in this case there are 600 total singular values). - After about the first 20−25 singular values, it falls off and the bulk of it is so low, that any information it contains is negligible. --- # PCA and Image Dimension Reduction : Basic Idea - Image recreated by using the first 10 singular values (left) vs. first 50 (right) - Essence of the image is basically captured in just 10 singular values out of a total of 600. - Increasing to the first 50 singular values shows that the picture is almost exactly reproduced (to the human eye). Furthermore, only using the first 50 singular values essentially cuts down the dimensions of the original image by 84%, while still preserving almost all the information from the original image. <img src="data:image/png;base64,#images/fig2c.jpg" width="70%" style="display: block; margin: auto;" /> --- # PCA and Image Dimension Reduction : Basic Idea - Looking at the next 100 singular values (left), we see some fine structure, especially around the feathers, which are generally indistinguishable to the naked eye. - The smallest 300 singular values (right) convey no real information, most likely just noise. <img src="data:image/png;base64,#images/fig2d.jpg" width="70%" style="display: block; margin: auto;" /> --- # KL Decomposition - Let `\(\mathrm{\bf{K}}^{X0}(v_1, v_2), \mathrm{\bf{K}}^{X1}(v_1, v_2), \text{ and } \mathrm{\bf{K}}^{W}(v_1, v_2)\)` denote the covariance operator of `\(X_{i0}(v), X_{i1}(v)\)`, and `\(W_{ij}(v)\)`, respectively. - Based on KL decomposition, we have: - `\(X_{i0}(v) = \sum^{\infty}_{k=1} \xi_{ik} \psi_{k}^{X0}(v)\)` - `\(X_{i1}(v) = \sum^{\infty}_{l=1} \zeta_{il} \psi_{l}^{X1}(v)\)` - `\(W_{ij}(v) = \sum^{\infty}_{m=1} \eta_{ijm} \psi_{m}^{W}(v)\)` - Where, `\(\psi_{k}^{X0}(.), \psi_{l}^{X1}(.),\)` and `\(\psi_{m}^{W}(.)\)` are the eigenfunctions of `\(\mathrm{\bf{K}}^{X0}, \mathrm{\bf{K}}^{X1}\)`, and `\(\mathrm{\bf{K}}^{W}\)`, respectively. Additionally, `\(\xi_{ik}, \zeta_{il},\)` and `\(\eta_{ijm}\)` are their corresponding eigenscores with: - `\(E(\xi_{ik}) = E(\zeta_{il}) = E(\eta_{ijm}) = 0\)` - `\(Var(\xi_{ik}) = \lambda_k^{X0}\)` - `\(Var(\zeta_{il}) = \lambda_l^{X1}\)` - `\(Var(\eta_{ijm}) = \lambda_m^{W}\)` --- # KL Decomposition Recall the longitudinal model described earlier: `$$Y_{ij}(v) = \mu(v) + X_{i0}(v) + X_{i1}(v)T_{ij} + W_{ij}(v) \\$$` Now it can be rewritten as `$$Y_{ij}(v)=\mu(v) +\sum^{\infty}_{k=1} \xi_{ik} \psi_{k}^{X0}(v) + T_{ij}\sum^{\infty}_{l=1} \zeta_{il} \psi_{l}^{X1}(v)+\sum^{\infty}_{m=1} \eta_{ijm} \psi_{m}^{W}(v) \\$$` The model above can be approximated by the first few coordinates (i.e., principal components) (i.e., `\(N_0 , N_1,\)` and `\(N_W\)`) that are typically small and determined using criterion-based methods (e.g. BIC). Giving us the eigenimages (i.e., low rank approximations of images). `$$Y_{ij}(v)=\mu(v) +\sum^{N_0}_{k=1} \xi_{ik} \psi_{k}^{X0}(v) + T_{ij}\sum^{N_1}_{l=1} \zeta_{il} \psi_{l}^{X1}(v)+\sum^{N_W}_{m=1} \eta_{ijm} \psi_{m}^{W}(v) \\$$` - Where the eigenscores are assumed to be normally distributed as `\(\xi_{ik} \sim N(0,\lambda_K^{X0})\)`, `\(\zeta_{il} \sim N(0,\lambda_l^{X1})\)`, and `\(\eta_{ijm} \sim N(0,\lambda_m^{W})\)` (Note: the normality assumption on the prior distributions of the eigenscores can be relaxed without difficulty. Other distributions with the existence of second order moments can be considered.) --- # KL Decomposition Recall the Cox model described earlier: `$$h(t|X_{i0}(v), X_{i1}(v), \boldsymbol{Z}_i) = h_0(t) \exp \Big[ \int_{V} \lbrace \beta_0(v)X_{i0}(v) + t\beta_1(v)X_{i1}(v)\rbrace dv + \boldsymbol{\gamma}^\prime\boldsymbol{Z}_i \Big]$$` Suppose `\(\beta_0(v)\)` and `\(\beta_1(v)\)` can be expanded on the respective eigenfunctions `\(\psi_k^{X0}\)` and `\(\psi_l^{X1}\)`, then `\(\beta_0(.)\)` and `\(\beta_1(.)\)` can also be approximated by `\(\sum^{N_0}_{k=1}\beta_{0k}\psi_k^{X0}(v)\)` and `\(\sum^{N_1}_{l=1}\beta_{1l}\psi_l^{X1}(v)\)`. This gives us the proposed model: `$$\begin{aligned} h(t&|X_{i0}(v), X_{i1}(v), \boldsymbol{Z}_i) \\&= h_0(t) \exp \Big(\sum^{N_0}_{k=1}\beta_{0k}\xi_{ik}(v) + t\sum^{N_0}_{l=1}\beta_{1l}\zeta_{il}+\boldsymbol{\gamma}^\prime \boldsymbol{Z}_i \Big)\\ &= h_0(t) \exp \Big(\boldsymbol{\beta}_0^\prime \boldsymbol{\xi}_i + t\boldsymbol{\beta}_1^\prime \boldsymbol{\zeta}_i+\boldsymbol{\gamma}^\prime \boldsymbol{Z}_i \Big) \end{aligned}$$` where `\(\boldsymbol{\beta}_0 = (\beta_{01},...,\beta_{0N_0})^\prime\)`, `\(\boldsymbol{\xi}_i = (\xi_{i1},...,\xi_{iN_0})^\prime\)`, `\(\boldsymbol{\beta}_1 = (\beta_{11},...,\beta_{1N_1})^\prime\)`, and `\(\boldsymbol{\zeta}_i = (\zeta_{i1},...,\zeta_{iN_1})^\prime\)` --- # KL Decomposition : General Idea <img src="data:image/png;base64,#images/fig3.jpg" width="100%" height="100%" style="float:center; padding:10px" /> --- # The Likelihood - Let `\(\boldsymbol{D} = \lbrace \boldsymbol{R}, \boldsymbol{\Delta}, \tilde{\boldsymbol{Y}}, \boldsymbol{Z} \rbrace\)` - Note that `\(\boldsymbol{\theta} = \lbrace \boldsymbol{\beta}_0, \boldsymbol{\beta}_1, \boldsymbol{\gamma}, \boldsymbol{\lambda}_k^{X0}, \boldsymbol{\lambda}_l^{X1}, \boldsymbol{\lambda}_m^{W}, \boldsymbol{h}_G \rbrace\)` - This leads to one single likelihood that can be used for fitting the model $$ p(\boldsymbol{D} | \boldsymbol{\theta})= \prod_{i=1}^I \int \int p(R_i, \Delta_i | \boldsymbol{\xi}_i, \boldsymbol{\zeta}_i, \boldsymbol{Z}_i, \boldsymbol{\theta})p(\boldsymbol{Y}_i|\boldsymbol{\xi}_i, \boldsymbol{\zeta}_i, \boldsymbol{\theta})p(\boldsymbol{\xi}_i|\boldsymbol{\theta})p(\boldsymbol{\zeta}_i|\boldsymbol{\theta})d\boldsymbol{\zeta}_i d\boldsymbol{\xi}_i $$ - Which does not have an explicit form --- # Estimation - In general, the route of estimation largely depends on the data at hand and the end goal - One simple way to go about this is to use a two stage approach: 1. Estimate the longitudinal model 2. Use estimates from (1) in the survival model - However this has been shown to lead to considerable bias when measurement error in the longitudinal observations are large - Many approaches use an EM algorithm adapted to the specific formulation of the joint model - Additionally, nonparametric approaches have also been developed - Other approaches have went Bayesian and developed MCMC algorithms - Given the complexity of the problem being solved here, the authors adopted a Bayesian approach --- # Bayesian Details: Priors **Unknown Parameters:** $$ `\begin{aligned} \boldsymbol{\beta}_0 \sim N(\boldsymbol{0}, \sigma^2_{\beta_0}I_{N_0}) \\ \boldsymbol{\beta}_1 \sim N(\boldsymbol{0}, \sigma^2_{\beta_1}I_{N_1}) \\ \boldsymbol{\gamma} \sim N(\boldsymbol{0}, \sigma^2_{\gamma}I_{q}) \end{aligned}` $$ - The `\(\sigma^2\)`s are taken to be large when prior knowledge is not available **Variance Components:** For `\(k = 1, \ldots, N_0\)`, `\(l = 1, \ldots, N_1\)`, `\(m = 1, \ldots, N_w\)`, $$ `\begin{aligned} \lambda_k^{X0} \sim IG(a_k^{X0}, b_k^{X0}) \\ \lambda_l^{X1} \sim IG(a_l^{X1}, b_l^{X1}) \\ \lambda_m^{W} \sim IG(a_m^{W}, b_m^{W}) \\ \end{aligned}` $$ - Empirical Bayes can be used to help choose hyperparameters - For example: `\(a_k^{X0} \leq 0.01\)`, `\(b_k^{X0} \leq 0.01\tilde{\lambda}_k^{X0}\)`, where `\(\tilde{\lambda}_k^{X0}\)` is the EB estimator --- # Bayesian Details: Priors (continued) **Hazard values:** - Here the approach taken to estimate `\(h_0(t)\)` is to treat it as a piecewise constant. For `\(g = 1, \ldots, G\)`, `$$h_g \sim Gamma(\alpha_{1g}, \alpha_{2g})$$` - With `\((\alpha_{1g}, \alpha_{2g})\)` often set to `\((0.2, 0.4)\)` or `\((0.5, 1)\)` to result in a flat prior. --- # Bayesian Details: Sampling - Sampling directly from the posterior is intractable - Mainly due to the existence of the latent variables `\(\boldsymbol{\xi}\)` and `\(\boldsymbol{\zeta}\)` - Instead sampling is done for `\(p(\boldsymbol{\theta}, \boldsymbol{\xi}, \boldsymbol{\zeta} | \boldsymbol{D})\)` - `\(\boldsymbol{\eta}_{ij}\)` can automatically be updated: `\(\boldsymbol{\eta}_{ij} = (\Psi^{W'}\Psi^{W})^{-1}\Psi^{W'}(\tilde{\boldsymbol{Y}}_{ij} - \Psi^{X0}\boldsymbol{\xi}_i - T_{ij}\Psi^{X1}\boldsymbol{\zeta}_i)\)` - The full conditionals for `\(\boldsymbol{\xi}_i\)` and `\(\boldsymbol{\zeta}_i\)` (jointly) and `\(\boldsymbol{\beta}_0\)`, `\(\boldsymbol{\beta}_1\)`, and `\(\boldsymbol{\gamma}\)` (jointly) are a bit unwieldy and don't follow a known distribution. - Require Metropolis Hastings with a multivariate normal proposal - Mean at the current value of the parameters of interest - Covariance proportional to the inverse of the fisher information of the posterior - Use a small variance parameter to tune acceptance rate to be approximately 35% - The variances of the eigenscores and `\(h_g\)`s can be sampled from using Gibbs sampling - The full conditionals for the variances ( `\(\lambda_k^{X0}\)`, `\(\lambda_l^{X1}\)`, `\(\lambda_m^{W}\)` ) are Inverse-Gamma - The full conditionals for `\(h_g\)`s are Gamma --- # Bayesian Details: Prediction - Given a set of longitudinal images and baseline covariates we want to predict survival probability - Conditional on surviving up to time `\(t\)`, what is the conditional probability of survival up to time `\(t +\delta\)`? - Notationally this is `\(P(T_i^* \geq t + \delta | T_i^* > t, Y_i^t, \boldsymbol{Z}_i, \boldsymbol{D})\)` - Thus, we can compute the above probability with the integrand: `\(\int P(T_i^* \geq t + \delta | T_i^* > t, Y_i^t, \boldsymbol{Z}_i, \boldsymbol{\theta})p(\boldsymbol{\theta}|\boldsymbol{D})d\boldsymbol{\theta}\)` - The authors show this can be estimated using Monte Carlo integration - Predictive ability can be assessed using a time dependent AUC - AUC(t, `\(\delta\)`) = `\(P(\pi_{i_1}(t, \delta) < \pi_{i_2}(t, \delta) | \lbrace T^*_{i_1} \in (t, t + \delta] \rbrace \cap \lbrace T^*_{i_2} > t+ \delta \rbrace)\)` --- # Data Analysis - Methods - Applied proposed method to ADNI dataset. - Only considered subjects who suffered from MCI at baseline and those with at least 3 longitudinal MRI scans: 339 subjects who had 3-6 follow-up visits (baseline, month 6, month 12, month 18, month 24, and month 36). - 127 subjects experienced conversion from MCI to AD. - Time to event was calculated as time from baseline to date of first diagnosis of AD for those that experience conversion and time from baseline to date of last visit for all others. - Censoring rate was approximately 62%. - MRI data were preprocessed to a dimension of 133 × 170 × 129. - Considered demographic variables such as gender and whether or not the subject was married. - Considered the APOE gene biomarker. Included indicators for one APOE allele carrier and two APOE alleles carrier. --- # Data Analysis - Methods - Used BIC to determine `\(𝑁_0,𝑁_1,\)` and `\(𝑁_𝑤\)` and assumed `\(𝑁_0=𝑁_1=𝑁_𝑤=𝑁\)`. BIC selected `\(𝑁=7\)`. Thus, the joint model with the first seven eigenimages was chosen for the analysis. - For the baseline hazard function they chose `\(𝐺=5\)` and selected the cut points as the quantiles of the observed survival time. - Used flat priors for the hyperparamters. - Used MCMC algorithm to collect 10,000 posterior samples after 10,000 burn-in iterations. - 5.03 minutes of computation time. --- # Data Analysis - Results <img src="data:image/png;base64,#images/fig4.jpg" width="45%" height="70%" style="float:right; padding:10px" /> - For the imaging predictors: - The 1st and 5th eigenimages of random imaging intercept and the 1st eigenimage of random imaging slope exhibit significant effects on AD hazards. - Several brain regions are detected to be highly associated with AD hazards: - The positive effect of “lateral ventricles” (the top row of figure) is evident, implying that the enlargement of the lateral ventricles is positively associated with the development of AD. Consistent with previous studies. - In the brain regions that depict negative effects, the magnitudes of the effects of “inferior parietal lobe” ( the second row of figure) and “inferior temporal gyrus and fusiform gyrus” (third row of figure) are relatively large. This is in line with existing research which indicate that AD is negatively correlated to the volume/thickness of these regions. --- # Data Analysis - Results - For the baseline covariates: - Gender had a significant negative effect on AD hazards, indicating that females have a higher risk of suffering from AD than males. - Surprisingly the APOE gene which has been found to be an important risk factor in previous research had an insignificant effect in this study. - One possible explanation is that APOE gene is somehow a confounding variable between certain brain regions and the hazards of AD. --- # Conclusions - Useful method for incorporating ultrahigh-dimensional longitudinal data in the joint modeling process. - Limitations/future work: - Assumes that `\(\beta_0(v)\)` and `\(\beta_1(v)\)` can be fully represented by the eigenfunctions of `\(X_{i0}(v)\)` and `\(X_{i1}(v)\)`. - Unfolding the 3D MRI images and treat them as a one-dimensional function may destroy the underlying spatial structure of the image. - The proposed joint modeling framework can be modified to handle informative dropout. --- class: questions Questions? --- # References **Paper Covered:** Kai Kang & Xin Yuan Song (2022): Joint Modeling of Longitudinal Imaging and Survival Data, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2022.2102027 **Other Sources:** Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000 Dec;1(4):465-80. doi: 10.1093/biostatistics/1.4.465. PMID: 12933568. Tsiatis A, Davidian M. Joint Modeling of Longitudinal and Time-to-Event Data: An Overview. Statistica Sinica. 2004; 14: 809-834. https://www.jstor.org/stable/24307417 --- # Spectral Decomposition - Spectral decomposition of `\(\mathrm{\bf{K}}^{X0}, \mathrm{\bf{K}}^{X1}, \text{ and } \mathrm{\bf{K}}^{W}\)` needs to be constructed to derive the eigenimages shown in the model. - Start by centering the longitudinal imaging data: - `\(\tilde{Y}_{ij}(v) = Y_{ij}(v) - \hat{\mu(v)}\)`, where `\(\hat{\mu(v)} = \frac{1}{n} \sum^I_{i=1} \sum^{J_i}_{j=1}Y_{ij}(v)\)`, `\(n = \sum^I_{i=1}J_i\)`, `\(I\)` is the total number of subjects, and `\(J_i\)` is the total number of visits for subject i. - However, we run into a problem due to `\(\tilde{Y}_{ij}(v)\)` being ultrahigh dimensional, making standard approaches for the spectral decomposition of `\(\mathrm{\bf{K}}^{X0}, \mathrm{\bf{K}}^{X1}, \text{ and } \mathrm{\bf{K}}^{W}\)` unfeasible. - For example, with the MRI images of 133 × 170 × 129 ( `\(p\)`=2,916,690) the resulting covaraince operator `\(\hat{\mathrm{\bf{K}}}^{X0}\)` is of dimension 2,916,690 `\(\times\)` 2,916,690. - To overcome this challenge, use singular value decomposition (SVD): - Let `\(\tilde{\mathrm{\bf{Y}}} = (\tilde{\mathrm{\bf{Y}}}_1, ..., \tilde{\mathrm{\bf{Y}}}_I)_{p \times n}\)`, where `\(\tilde{\mathrm{\bf{Y}}}_i = (\tilde{\mathrm{\bf{Y}}}_{i1},...\tilde{\mathrm{\bf{Y}}}_{iJ_i})\)` is the centralized `\(p \times J_i\)` and column `\(j\)` for `\(j = 1,...,J_i\)` contains the unfolded images for subject i at visit j - Construct the singular value decomposition (SVD) of the matrix `\(\tilde{\mathrm{\bf{Y}}}\)` to get the best low-dimension approximation of the matrix which subsequently allows us to obtain a low-dimension representations of the covariance operator `\(\hat{\mathrm{\bf{K}}}^{X0}, \hat{\mathrm{\bf{K}}}^{X1}, \text{ and } \hat{\mathrm{\bf{K}}}^{W}\)`. --- # Prediction performance <img src="data:image/png;base64,#images/fig5.jpg" width="100%" style="display: block; margin: auto;" /> --- # Simulation Summary: Simulation 1 Setup - Simulate 100 datasets with images of `\(50 \times 50 \times 50\)` and with `\(\mu(v) = 0\)`. - In these simulations they use two different sample sizes: 300 and 500 - Generate data with each subject having 7, 8, or 9 scans over time - Two principal components are used for the intercept and slope, and four are used for the random subject-visit deviation - They also considered three baseline hazards (constant, linear, non-linear) and a range of censoring rates - Evaluated Bias, RMSE, and SE --- # Simulation Summary: Simulation 1 Results - Simulation 1 showed that the estimation procedure was satisfactory for estimating `\(\boldsymbol{\beta}_0, \boldsymbol{\beta}_1\)`, and `\(\boldsymbol{\gamma}\)` as well as the estimates for the variances of the eigenscores. - They also found that performance is improved as the sample size is increased or censoring rate is decreased - Relatively consistent across different baseline hazards - Choice of `\(N_W\)` hardly effects estimation of survival model - Use of BIC to select `\(N_0\)` and `\(N_1\)`, (where `\(N_0\)` = `\(N_1\)`), correctly selected 2 in 99 of the 100 simulations. - Each replication took about 4 and a half minutes. --- # Simulation Summary: Simulation 2 - Similar setup as simulation 1, but focused on the comparison to the two-stage approach - Used a constant baseline hazard. - They also considered larger eigenscores for the subject-visit variation (1, 0.75, 0.5, 0.25) - Found that the two stage approach tends to produce biased results that does not improve with an increase in sample size - Worsens with an increase in the subject-visit variation as well --- # Other Assumptions - The effect of the longitudinal data in the time-to-event model should be left-continuous - Ideally, in the absence of an association, the joint modeling process should yield models as if the two components were modeled separately - The hazard, conditional on the longitudinal history and covariates, is associated with the rate of change of the smoothed underlying trend of the images - The random effects are independent of the covariates - The random effects are assumed normally distributed - Censoring, timing of longitudinal measurements, and measurement process depend only on the observable history and latent random effects --- # Tsiatis and Davidian Specification - Following the notation for this presentation - Slightly different longitudinal modeling: `$$Y_{ij}=f(t)^T\boldsymbol{X}_{i}+W_{ij}$$` - Same hazrd: $$ h(t|\boldsymbol{X}_i(t), Z_i) = h_0(t)exp \Big[\boldsymbol{X}_i(t)\boldsymbol{\beta} + Z_i \gamma \Big] $$ Where `\(\boldsymbol{X}_i(t) = f(t)^T\boldsymbol{X}_{i}\)` - More general in terms of the random effects being a function of time - No term for the mean trajectory (this does show up in Henderson, Diggle, and Dobson (2000))